o 



Calculating the normalising constant of the Bingham distribution 
on the sphere using the holonomic gradient method 

-.^ ■ Tomonari Sei* and Alfred Kume^ 

(N 

u . 

Qh. Abstract 

■^r ' In this paper we implement the holonomic gradient method to exactly compute the normalising 

constant o f Bingham distri b ution s. This idea is originally applied for general Fisher-Bingham distri- 
butions in iNakavama et alj l|2011f ). In this paper we explicitly apply this algorithm to show the exact 
calculation of the normalising constant; derive explicitly the PfafBan system for this parametric case; 
implement the general approach for the maximum likelihood solution search and finally adjust the 

Cj ' method for degenerate cases, namely when the parameter values have multiplicities. 

v,^ ' Keywords: Bingham distributions, directional statistics, holonomic functions. 

to ■ 1 Introduction 

Let p > 2 and S'p^^ = {x G M^ | x^ x = 1}, the unit sphere in the p-dimensional Euchdean space. Let dx be 
^ I the uniform measure on S^~^ with /cp-i dx = 2^^'"^ /T{p/2). Then modulo an orthogonal transformation 
C^ ' in SP~^, the Bingham distribution has density function with respect to dx on Sp~^ as 

g^; mo)-^^e^'-^'-l (1) 

(^ , where 6 — (6*1, ... , 9p)^ is the parameter and C{6) is the normahsing constant 

C(0)= / e^-i^-^-da;. (2) 

J5P-1 

^^ I Simple arguments confirm that for any c G M 

where 9 + c = {9i + c, . . . ,9p + c)^ . Hence without loss of generality we can assume that 9i can be all 

p ositive. 

Kume and WoodI ( 20051 ) show that C{9) is actually closely related to a particular value of the density of 



a random variable defined as a linear combination of the p independent Xi random variables. In particular, 
using the Laplace -transform inversion argu ments one can show the following one-dimensional representation 
of C{9) is useful (|Kume and WoodI (|2005M 



CiO) = ^ r^^^ = ^^==e-*dt, (3) 



where to is any real number less than —9k for all k. Recall that C(0) = /cp-i dx ~ 27r'°/^/r(p/2). If p = 2, 
the integral in ^ should be interpreted appropriately because it is not integrable in the Lebesgue sense 
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(see Appendix). Note however that for the p — 2 case the normahzing constant is related to that of the 
von Mises-Fisher distribution involving the Bessel function of the first kind. While in general there is not 
a closed form for C{6), its calculation is essential in likelihood estimation of the parameters of Bingham 
distributions . Th e sadd lepoint approximation is shown to work very well in a range of parameter values (see 
iKume and Wood! ( 20051 )). In this paper however, we will exploit the connection between partial derivatives 
of C{9) to implement the theory of differential equations. The basic idea here is that provided that we have 
a well defined curve in the parameter space 9 whose value C{9q) at initial point 9q is accurately known 
then the numerical methods of the differential theory will provide accurate solutions for the end point of 
the curve. In principle, if we can then provide the starting point accurately we will get the end point after 
n umerical routines i mplem entation. This approach has started to be implemented for similar distributions 



Nakavama et al.l (|2011l ). In this paper we will explicitly adopt the theory for the Bingham distribution 



by constructing the relevant Pfaffian equation and using it for deriving numerically the exact solution of 
C{9). 

The paper is organised as follows. In section 2 we provide a quick review of the Holonomic gradient 
methods. In section 3, we provide the necessary calculations for implementing this particular gradient 
method to Bingham distributions including the degenerate cases of multiplicities in the parameters. In 
section 4 we provide the numerical evidence of the method proposed. We then compare it with the saddle 
point approximation and other cases when we know the normalizing constant expression exactly. We 
conclude the paper with some discussion. 



2 Review of the holonomic gradient methods 



In this section we review the fran i ework of the ho l ononi i c gradient methods. S ee iNakavama et al. (12011 K 
Hashiguchi et al.l(|2012h . lSei et al.l(|201ll ). lKovamal(|201lh . lKovama et al.l(|2012al ) and lKovama et al.l (|2oT2bl ) 
for details and further information. 

We consider not only the Bingham distribution but also a general parametric family f{x\9) on the 
sample space X with the parameter 9 = (0i, . . . , 9d)^ ■ The parameter space Q is an open subset of the 
d-dimensional Euclidean space. 

We assume that the density function f{x\9) is an elementary function of 9 and a r-dimensional vector 
G = G{9) satisfying the following PDE: 



d,G{9)^P,{9)G{9), z = l,. 



(4) 



where di denotes d/d9i and Pi{9) is a r x r- matrix of rational functions of 9. The equation (j4|) is called 
the Pfaffian equation of G and plays an essential role in this paper. Typically the vector G consists of the 
normalising constant of f{x\9) and its derivatives. We give an example. 
Example 1. Consider the von Mises-Fisher distribution 



f{x\9,^Ji) = 



1 



C{9) 



Jp.' X 



G{9) = 



e'^'^^'^dx, 



gp-l 



on the unit sphere S^ ^, where > 0, /i G 5^ ^, and dx is the uniform measure. It is known that 
C{9) = (27r)P/^J n/2-i (6*)/0P/^ ~^ and I^ denotes the modified Bessel function of the first kind and order 



u (see p. 168 of iMardia and Jupd (J2000l )). The function C satisfies the following ordinary differential 
equation: 

C"i9) + P—lc'{9) - C{9) = 0. 



Putting 9 — {9i) and G = (C, C") , we have the Pfaffian equation 



di 



Gi 



1 \ G, 
'^ [g. 



The density function is written as an elementary function f{x\d,pi) — e^^ ^ /Gi{6) of 0, /i and G. Note 
that the modified Bessel function itself is not an elementary function. D 

2.1 The HG algorithm 

Assume that a numerical value of the vector GiO^'^'^) at some point 0'°-' G 8 is given. The holonomic 

gradient (HG) algorithm evaluates G{9^^^) at any other point 9^^\ The algorithm is based on the following 

lemma. 

Lemma 1^ Let ^(r), r e [0,1], be a smooth curve in 9 such that 0(0) = e^°'> and 0{l) = O^'^l Put 

G{t) = G{9{t)). Then G{t) is the solution of the ordinary differential equation (ODE) 

±G{r) ^J:'^Pmr))G{rl (5) 

2 — 1 

with the initial condition G(0) = G{e'^°^). In particular, G(l) = G(e'(i)). 
Proof. By differentiation of composite functions, we have 

= ^^P^(0»)G(%)), 

where the last equality uses the Pfaffian equation (U). This proves (O- The initial condition is obvious. D 

The HG algorithm is described as follows. A natural choice of 6 is the segment 6{t) = (1 — t)6'^^'> +t9''^'^ 
connecting 0*^"-' and d'^^\ 

Input e'^"), G{e^°'>) and6'(i). 
Output G(6'(i)). 
Algorithm 

1. Numerically solve the ODE © over re [0, 1]. 

2. Return G(l). 

Note that the standard numerical routines for solving ([5]) are highly accurate and available in most computer 
packages. 

2.2 The discrete-time HGD algorithm 

In the following, we will implement the HG algorithm for maximum likelihood estimation of the param- 
eters for the parametric family f{x\9) (including that of Bingham). Let x{l), . . . ,x{N) be some ob- 
served data and we want to perform MLE based on the parametric family. The log-likelihood function 
£{0) = Eili ^0Sf{xit)\9) is written as £{9) = L{9, G{9)), where L : O x R'' ^ M is an elementary function 
and G satisfying the Pfaffian equation ^. We consider the (naive) Newton-Raphson method: 

Q{k+i) ^ Q{k) _ [Hess^(e'('=))]-i grad^(6'('=)). (6) 

where 



and 

It is expected that the solution O'^^'^ converges to the MLE ^ as fc — > oo. 

The holonomic gradient descent (HGD) algorithm numerically updates G{9^^' ) by the HG algorithm. 
The gradient vector and Hessian matrix of (i{9) at O'^^^ are computed only in terms of O'^^'' and GiO'^^''). In 
fact G is a vector of the normalizing constant and its derivatives which are closely related to derivatives in 
([6]). More explicitly, the following lemma holds. 
Lemma 2. For any i and j in {1, . . . , d}, we have 



ae, ^' ' dGa 

1=1 °- G=G{9) 






a=l 

52] 



E(^.G)^ ^"^ 



de.dGa 

dL 



Y,{{djP,)G + P,PjG)a 
a=l 



dGa 



E (P.GUP.G), 



'dGadGb 

a, 0=1 



G=G{e) 



where {PiG)a is the a-th component of the r- vector PiG and so on. 

Proof. The formulas are obtained by differentiation of composite functions and the PfafBan equation Q . D 

The holonomic gradient descent algorithm is described as follows. We refer to this algorithm as the 
discrete-time HGD algorithm in order to distinguish it from the continuous-time HGD algorithm defined 
in the following section. 

Input The initial point 6l(o) e M'* and G(6i(o)) e W. 

Output The MLE § and G{e). 

Algorithm 

1. Let k = Q. 

2. Compute ^C^+i) by the formula ([6]) via Lemma [2l 

3. ComputeG(6'('=+i))bytheHGalgorithm, i.e., numerically solve the ODE ([5]) with %) = (1-t)6iW + 

4. If the norm || grad£(6')|| is sufficiently small, then return ^^'^^^^ as output. Otherwise, put A; <— fc + 1 
and go to Step 2. 



2.3 The continuous-time HGD algorithm 

We can also consider a continous-time version of the Newton- Raphson scheme. Let £{9) be the log-hkehhood 
function as in the preceding section. Consider the solution 9{t) of the following differential equation: 

^ = —[Ressemr^ grad£(^), < r < 1, (7) 

dr 1 — r 

where the gradient vector and Hessian matrix of £{9) are calculated as in Lemma [H The solution ^(r) of 
([7]) converges to the (local) maximum likelihood estimate ^ as r — >■ 1 because 



by (III). This means 



-^[grad£(0(T))] =Hess£(^(r))^ = _^grad£(%)) 

UT UT 1 — T 



grad£(^(r)) = (1 - r) grad£(0"(O)), (8) 



which converges to as r —t' 1. The update rule for G{t) :— G{9{t)) is the same as the equation dS]). 
The continuous-time version of the holonomic gradient descent is described as follows. 

Input The initial point 9^^^ e M'', G{9^^^) £ W and a sufficiently small number e > 0. 

Output The MLE 9 and G{9). 

Algorithm 

1 Solve the ordinary differential equation ([5]) and ([7]) from T = Otor = l — e, simultaneously, with the 
initial value ^(0) '= 9^"^ and G(0) = G{9^"'>). 

2 Return ^(1 - e) as ^. 

Remark 1. The continuous-time version is eaisier to implement than the discrete-time one. A drawback 
of the continuous version is that the point t = 1 of the equation ([T]) is singular. Therefore the number e 
in the HGD algorithm cannot be very small. Instead, the approximate value 9(1 — s) will be improved by 
one or several steps of the discrete-time scheme ([6]) . 

3 PfafRan equation for the Bingham distribution 

In this section, we consider the Bingham distribution ([2]) again. The Pfaffian equation for the Bingham 
distribution is initially derived for a generic case and then we focus on the singular cases which occur when 
parameter values of 9 have multiplicities. For example, the complex Bingham distribution is a singular 
case. 

3.1 Generic case 

We first derive partial differential equations satisfied by the normalising constant of the Bingham distribu- 
tion on the set {9 \ 9i y^ 9j for i ^ j}. Denote the partial derivative d/d9i by di. 

Lemma 3. The normalising constant of the Bingham distribution satisfies the following set of partial 
differential equations: 

p 
Y^d.G^G, (9) 

i=l 

MC = §^^, l<^<J<p. (10) 



Equ ation (jlOp is known as the Euler-Darboux (or Euler-Poisson-Darboux) equation (e.g. iTakavama 
(|l992l) V 



Proof. Equation (jH]) immediately follows from the definition ^ of C{d). Indeed, since the support of the 
measure dx is S^^^, we have 



P'--UP¥' 



/5P-1 

Equation pU)) is proved as follows. Assume p > 3 for simplicity. Based on (|31) and Lebesgue's convergence 
theorem, the first derivative of C is 

2^z 4_,^ 2(-ft, - t) nLi V-Ok - t 
For i 7^ j, the second derivative is 

27ri Jt„^.ioo 

1 



4(-0.-t)(-0,-t)nLiV^^r^ 

The partial fractional decomposition yields 

1 



e^*dt. 



1/1 1 



2(0, -0j) \2{-0,-t) 2{-0j-t)^ 
Then we obtain (fTO|) . For p = 2, the same equation is derived. D 

We give the Pfaffian equation of C{9). Let 

G = G{e) = {diC,...,dpC). 

Note that C is written as C = X^iLi ^« due to ©. 

Theorem 1. The vector G satisfies the following PDEs. For i and j in {1, . . . ,p}. 



and 



^.'^-c.-gl^^ a^) 



where Y%=ii denotes summation over A:e{l,...,p}\{i}. 



Proof. The equation pT|) follows from the definition of G and the equation pU)). We prove (|12p. By ([S|), 
9iGi is written as 




Since diGk is given by (ITTt . we have (IT^ . D 

For the Bingham distribution, the parameter 9 has redundancy because the following identity holds for 
any real number c: 

f{x\9)^f{x\0~clp), (13) 

where Ip is the p- vector of ones. There are two methods to remove the redundancy. One of them is to 
restrict one of the parameter say dp — 0. Then the derivative dp has to be removed from the Pfaffian 
equation in a proper way. The result is the following corollary. 
Corollary 2. The vector 

G = (Gi, . . . ,Gp) = {C,diC, . . . ,dp^iC)\gp=o 

satisfies the following equations. For i and j in {1, . . . ,p — 1}, 

9.Gi = G.+i, 9.G,-+i = ^;|^ ~ ^^+^ , ^^J, (14) 



and 



a.G.+i - G.+r - ^ 2(0. - ^.) 

G»+i ~ (Gi - X^Li ^^+i) /it;\ 

^^^ ' ^^^^ 

where X^fc^i denotes the sum over fc e {1, . . . ,p — 1} \ {i}. 

Proof. In (HH), the equation 9iGi = Gi+i for 1 < i < p— 1 is obvious from the definition of G. The equation 
for diGj+i is the same as dTTI) . Finally, (|15p is obtained from (|12l) if one notes Gp = Gi — 'YTiZi Gi+i. D 

Another method to remove the redundancy is to impose a penalty factor to the likelihood function like 

f{x\e) = f{x\6)e-'l'\ 
Then the resultant MLE should satisfy 6p = 0. Similarly, if we adopt 

fix\e)^f{x\e)e-«^^+-+<^^y'/^ 

then the resultant MLE satisfies J2^=i ^i — 0- The penalty factor can be considered as the Bayesian prior 
density for 9. 



3.2 Degenerate case 

We now consider the degenerate points of equations (TTTI) and (|12l) . If multiplicities occur, i.e. some Oi = 6j 
for some i j^ j, coincide, then the Pfaffian equation can not be defined because the demonimator in (|11|) 
becomes zero. Connections betwe en such d egenerate cases and those in the generic case are essentially 
higher order derivatives (see e.g. (|Kume and Wood . |2007j)). To deal with such degenerate cases, we give 
the following theorem. 

Theorem 3. Let 9 — {(pi, . . . , ^i, . . . , (pq, . . . , cpg), where {4>j} takes distinct values and (pj appears dj times 
for each j — 1, . . . ,q. Denote the derivative d/dcpj by dij,- . Then we have the following equation of 



For i and j in { 1 , . . . , g} , 



G:={d^,C{B),...,d^^C{e)). 



dj ijfi — ctjCz-i 



%g.- l: :, , ^^J, (16) 



^yvi — vj 



In particular, ii q = p and di = ■ ■ ■ = dp = I, then the equations coincide with (jlip and (J12p . Furthermore, 
G — (Gi, . . . , Gp) — (C, dc/,-^ , • • ■ , c?0p_iC')|,^p=o satisfies the following equations. For i and j in {1, . . . ,p— 1}, 

9,.Gi = G,+i, a^.G^+i^^^p^T^, ^^J, (18) 

2(0, -0j) 

Q /=. _ /=< V^ diGk+1 — dkGi+i 



dkGi+i — di{Gi — X^f^i '-^^+i) 
20, 



(19) 



Proof. These equations are derived in the same way as Theorem [T] using the 1-dimensional representation 
© of G{0). D 

As a corollary of the theorem, we obtain the Pfaffian equation for the complex Bingham distribution. 
The complex Bingham distribution is a distribution on the complex sphere 5*^'"^ = {x G C^ \ x'^x = 1}, 
where a;^ denotes the complex conjugate transpose of the complex vector x. The complex Bingham density 
function with respect to the uniform distribution is 

/,(x|0) = G,(0)-ie^?^i^-l--^ 

where Gc{(t>) is the normalising constant Gc{(t>). It is known that 

and this is a special case of Bingham di stributions where the entries in th e 6 parameter are in pairs i.e. 
di = ■ ■ ■ = dq = 2 (see e.g. iKent I (|l994[) p.472 of iKume and Wood! (|2005[ )V We will compare the exact 
expression above with that of the HG algorithm in the next Section. 

From Theorem [S] the PDF varies depending on how the parameter vector 9 is degenerated, namely 
the multiplicities in 6. However we show below that, at least for the maximum likelihood estimation, the 
multiplicities in 9 do not change since they are driven by the sufficient statistics. Let Si = N~^ J2t=i ^i(^)^j 



i = l,...,p, be the sufficient statistics with respect to the Bingham density. Then the log-hkelkihood 
function is given by 



m = N[yd,s,~ log c{9) 




From the theory of exponential famihes, there exists the unique ML E up to the redimdancy ([13)) if and 
only if the sufficient statistics satisfy Si > for i = 1, . . . ,p (see e.g. iBarndorff-NielsenI ( 19781 ). Corollary 



9.6). Note that if s^ = for some i, then all the data {xk{t)} lies on a common hyperplane Xi = and the 
i-th coordinate can be removed from the analysis. We assume the condition Sj > for all i hereafter. 

We first give a lemma. 
Lemma 4. Assume s = (cri, . . . , ai, . . . , (Tg, . . . , Cq), where <7i appears di times for each i — 1, . . . , g and 
satisfies cri < • • • < Ug. Then the maximum likelihood estimate forms into 9 = (0i, . . . , 0i, . . . , (/)q, . . . , 0q), 
where (fii < ■ ■ ■ < (pq. 

Proof. Let 6 = (6'i)fL]^ be the unique MLE. We first prove that di < 9j for any i < j. Note that Si < Sj 
for any i < j by the assumption. Assume 9i > 9j for some i < j. Then define a permuted vector 9'^ by 
e^ = Oj, 9] = 9, and 9^ = 9k for other fc's. Since C(^'') = C{9) by symmetry, we have 

fc=i fc=i 

^ {§, - 9,){s, - s,) > 0. 

This contradicts to uniqueness of the MLE. Hence 9i < 9j for any i < j. The same argument shows that 
9i = 9j for any i < j such that Si — Sj . Finally, we prove 9i < 9j for any i < j such that Si < Sj . Assume 
9i = 9j for such i and j. Then, by using the likelihood equation, we have 

s, ^ -^^logCi9) ^ ^logCi9) ^ s„ 

which contradicts to the assumption Si < Sj. This completes the proof. D 

By the above lemma and symmetry with respect to indices, the order oi 9i, . . . ,9p coincides with that of 
Si, . . . , Sp including the number of multiplicities. Furthermore, the following theorem states that the order 
is preserved during the HGD algorithm. 

Theorem 4. Assume that the order of 6*^°^ coincides with that of the sufficient statistics s (including 
multiphcities). Then the order of 9{t) is preserved over r G [0, 1), where 9{t) is the update rule ([7]) of the 
continuous-time HGD algorithm. 

Proof. Let 77(6') = gradlogC(6') be the expectation parameter. Then the update rule (|8]), equivalent to (O, 
is written as 

s-77(%)) = (l-r){.- 7^(0(0))}, 

or 

,7(%)) = (l-T)77(e'(")) + rs. (20) 

This is the line segment connecting r]{9^^^) and s. From the argument of exponential families, 9^°^ is 
considered as the MLE when the sufficient statistic is r/(0(°)). Therefore, by Lemma |31 the order of ^f"' 
coincides with 77(0'^'^''). By (PH)) and the assumption on the order of 6*^°'', we deduce that the order of 7y(^(T)) 
is preserved over r g [0,1). Finally, since 9{t) is considered as the MLE when the sufficient statistic is 
ri{9{T)), the order of 9(t) is the same as r]{9{T)) by Lemma [4] again. This proves the theorem. D 



4 Implementation issues and numerical evidence 

In this section we focus on the performance of our method. Note that there are two numerical procedures 
which need to work here. One is the accurate initial condition of the PDE equation and secondly the 
accuracy of the solution obtained via the PDE machinery. The later is a standard issue in implementation 
of the relevant packages where controllable accuracy is possible. We focus on the first procedure here, that 
of accurate calculation of the initial values. 

4.1 Initial values 

For the HG and HGD algorithms, we need to compute C{9^^'') and its derivatives at an appropriate point 
6^^\ If fl^^-* is sufficiently small, then they are calculated by the following power series expansion with an 
appropriate truncation. 

Let 0{(j), d) = (01, . . . , 01, . . . , (j)q, . . . , (j)q) be a parameter vector, where the multiplicity of (pi is di for each 
i ^ 1, . . . , (7 and therefore T^^-j di — p. Denote (p — ((/>i)i=i ^.nd d — (di)f=i- Then, by using the argument 
of iKume and WoodI (|2007l ). we have 



c{e{^,d)) 



27rP/2 



fci=0 



nLir(#) 
- fci!...fc,!r(ELi(^» + f))' 






The derivatives are calculated by 



alii 






c{e{cp,d)) 



for any index j = (j'l, ■ . ■ , Jq), where \j\ = Y1h=i h (JKume and WoodI (|2007f )). Let 



27rP/2 

CNi0{^,d)) 



uum 



En. 



2 
9 



fci...^^, nLir(fe, + f) 



A:i!...fc,!r(ELi(fc. + f))' 



ki + - + k,<N ^' y-^V^j=n'n I 2 



According to iKovama et al.l ()2012al) , we have 



|C(0(0,d))-Cjv(0(0,rf))| 



\N 



iV+1 



< ^-^^i-n 1 (7(0). (22) 

- N\ iV + l~||0||i ^ ^ ^ ' 

where ||0||i = X]?=i l<^j|- ^^^ Appendix [B] for details. 

Table[T]shows the computed values of G{0) (see Corollary [2] for the definition of G{9)) aXO = {{p—i)/'2.pY^^-^ 
for each dimension p. The truncation number N is selected to make the right hand sides of ([2^ less than 
some required accuracy e. In our numerical examples we have chosen e to be 10~^. 

The power series expansion (|21l) is clearly valid for every 9 while the number of terms needed for the 
series approximation will be heavily dependent on the norm of the parameter 9. Hence, if the norm of 9 
is large, then the truncation number TV that assures the required accuracy is likely to become intolerably 

10 



Table 1: A set of values G{e) normalized by C(0) = 2TrP/'^/r{p/2) is shown, where is {{p - i)/{2p)f^^ 
The last column shows computational time [sec] of the power series. 



p 


G{e)/c{G) 


time 


2 


(1.137579, 


0.604270) 




0.0 


3 


(1.185742, 


0.421987, 


0.394412) 


0.0 


4 


(1.210162, 


0.321833, 


0.308437, 0.295857) 


0.0 


5 


(1.224897, 
0.237834) 


0.259286, 


0.251813, 0.244669, 


0.2 


6 


(1.234745, 
0.203460, 


0.216746, 
0.199319) 


0.212168, 0.207741, 


1.0 


7 


(1.241789, 


0.186029, 


0.183026, 0.180101, 


5.3 




0.177252, 


0.174476, 


0.171771) 




8 


(1.247075, 


0.162847, 


0.160774, 0.158744, 


26.4 




0.156756, 


0.154810, 


0.152903,0.151036) 




9 


(1.251187, 


0.144750, 


0.143260, 0.141795, 


127.5 




0.140356, 


0.138941, 


0.137550, 0.136182, 






0.134837) 








10 


(1.254477, 


0.130242, 


0.129136, 0.128045, 


589.5 




0.126970, 


0.125910, 


0.124866, 0.123836, 






0.122821, 


0.121820) 







large. Both HG and HGD methods are very useful for this situations. Specifically, in order to compute 
the values of CiO) for large 6*, we first compute G(0*^°^) for sufficiently small O'^^'^ and then apply the HG 
method with ^^^^ = 6. To avoid singular points of the Pfaffian system, the order of the components of 6''^°^ 
(including multiplicities) should coincide with those of 0. 

The initial point 9^^^ of the HG algorithm can be arbitrarily selected independent of 9^^^ as long as it 
avoids the singular points. In other words, we can run the HG algorithm for any possible choice of the 
generic parameter 6*^^^, while we keep the initial value fixed at a specific point 9^'^\ Hence for a given p, it 
suffices or the HG algorithm to provide in tabular form the relevant values G{9q) for some fixed 9'^^\ In 
Table [T] we show all necessary values for 2 < p < 10 for the generic case. Note that C{6) — C{tt9) for any 
permutation tt. 

4.2 The HG method 

Table [2] compares computational time of the power series expansion and the HG method. The parameter 
values examined are 9 — {a{p — i)^)^^i for several p, a and b. For the HG method, the initial point is 
selected to 9^°^ = 9/2\\9\\i, which means ||6'(°)||i = 1/2, and the initial value G{9'^°^) is calculated from the 
power series expansion. For each case in Table [21 numerical values of C{9) computed by the two methods 
coincide up to 10~^ whenever the power series method returns the value in a practical time. 

If 9 is so large that the power series method fails, then we examine a logarithmic version of the Pfaffian 
system to confirm numerical accuracy indirectly. Let G^ = {di log C,. . . ,dp log C) = G/G. Then G^ 
satisfies the following non-linear system: 



d.G'^ = {P^G''), - GfG'. 



(23) 



for i,j — 1, . . . ,p, where Pi is the Pfaffian matrix defined by (|11|) and ()12p . This system is numerically 
more stable than that of G because G becomes quite large if 9 is large. Figure[T]is the trajectory of G{9{t)) 
and G^{9{t)) against r G [0,1], respectively, when 9^°^ = (0.4,0.1,0) and e'^^^ = (20,5,0). 
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Table 2: Computational time [sec] of the PS (power series) and HG algorithms. The parameter values 
examined are 9 — {a{p — i) )^^i. The symbol NA means that the PS method did not return an output in 
a practical time. For such cases, the logarithmic version (|23p is used to confirm numerical accuracy. 



p 


a 


b 


ll^lli 


C{0)/C{O) 


PS 


HG 


5 


1/20 


1 


1/2 


1.105961 


0.1 


0.3 


5 


1/10 


1 


1 


1.224897 


0.2 


0.3 


5 


1 


1 


10 


9.769432 


17.1 


0.3 


5 


10 


1 


100 


3.824 X 10" 


NA 


0.3 


5 


1/60 


2 


1/2 


1.106713 


0.1 


0.3 


5 


1 


2 


30 


5.253880 X lO" 


48.6 


0.3 


10 


1/90 


1 


1/2 


1.051360 


14.0 


14.8 


10 


1/45 


1 


1 


1.105546 


49.7 


14.7 


10 


2/45 


1 


2 


1.223062 


386.2 


14.6 


10 


1 


1 


45 


1.757059 X 10^ 


NA 


14.6 


10 


1/570 


2 


1/2 


1.051466 


13.9 


14.1 


10 


1 


2 


285 


3.802 X 10^8 


NA 


15.2 



0.0 



CM 






1 






G1 






--- G2 


1 


O 

+ 

o - 




G3 


/ 


CD _ 






8 ^ 

+ 




^y 















0.2 



0.4 



0.6 



1.0 



(a) Trajectory of G(6'(t)). 




0.0 0.2 0.4 0.6 



tau 



(b) Trajectory of G^(6'(t)). 

Figure 1: Trajectory of G{9{t)) and G^{6{t)) against r e [0, 1], where 9{t) is the line segment connecting 
6'(°) = (0.4,0.1,0) and 0^^^ = (20,5,0). 
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4.3 Comparison with the saddle point approximation 

One method which is used for likelihood inference on Bingham distributions is based on the saddle point 
approximations. The first order saddle point approximation of ([3|) is 



where t^ = i*(^) is the unique solution of 



1 




p . 1 -1/2 



(24) 



fc=i 






= 1 and 



i* < min(— 0fc). 



(25) 



Second order approximations are improvements of the one above (see (|Kume and Woodl . 120051 ) for more 
details). It is shown however, that these improved versions are adequate for many practical applications 
since it takes a very large sample size so that the MLE estimates differ significantly from those of saddle 
point approximations. Another nice feature of the saddle point approximation is that the whole method 
is fast and it involves only a single one dimensional optimization procedure. In Tables |3] and H] we 
compare the second order saddle point approximation of the normalizing constant with HG algorithm. We 
also compare both of these methods in the cases of Complex Bingham distributions, whose normalizing 
constant is known in closed form. As can be seen the HG algorithm performs well and is exact in the 
complex Bingham case. 



Table 3: Columns 2 and 3 in the table compare the saddle point approximations (spa) of 61 = (0,-1, —2, —k) 
with that of hg algorithm; columns 4 and 5 compare the same quantities for = (0, —1, —2, — k, — k) and 
the last three columns compare respectively the saddle point, exact and hg of the complex Bingham with 
parameters (p = (0, —1, —2, — k) 



K 


spa 


hg 


spa 


hg 


spa 


ex 


hg 


5 


4.237006 


4.238950 


3.376766 


3.372017 


5.942975 


5.936835 


5.936835 


10 


2.982628 


2.985576 


1.689684 


1.689355 


3.429004 


3.425468 


3.425468 


30 


1.708766 


1.711919 


0.555494 


0.556123 


1.248280 


1.246421 


1.246421 


50 


1.321178 


1.323994 


0.332102 


0.332661 


0.761347 


0.760180 


0.760180 


100 


0.932895 


0.935094 


0.165587 


0.165940 


0.385272 


0.384675 


0.384675 


200 


0.659185 


0.660814 


0.082676 


0.082871 


0.193779 


0.193477 


0.193477 



4.4 The HGD method 

Table [5] compares computational time of the discrete- and continuous-time HGD methods. The data is 
s = (2i/p{p + l))f^i for each p. The initial point 6''"' = {0{ ,... ,9p ) is selected such that the order of 
(si, . . . , Sp) coincides with the order of 0*^°^ This avoids singularity. The numerical error of the MLE 9 is 
evaluated by maxi<i<p \di \ogC{d) — Si\, which must be zero if 6 is correct. 

Figure [5] shows an example of the trajectories of 6'^*'-' (for discrete algorithm) and 9{t) (for continuous 
algorithm). The data is 

1 2 3 4 5 
15' 15' 15' 15' 15 
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Table 4: Columns 2 and 3 in the table compare the saddle point 
(0, — 1, — 22, — k); columns 4 and 5 compare the same quantities for 9 
last three columns compare respectively the saddle point, exact and hj 
parameters </> = (0, —1, —22, — k) 



approximations (spa) oi = 
= (0,— 1,— 22,— K, — k) and the 
; of the complex Bingham with 



K 


spa 


hg 


spa 


hg 


spa 


ex 


hg 


5 


1.258672 


1.273161 


1.032128 


1.044072 


0.921027 


0.921726 


0.921726 


10 


0.874523 


0.883394 


0.500707 


0.505223 


0.506236 


0.506341 


0.506341 


30 


0.497757 


0.503213 


0.162251 


0.163901 


0.177602 


0.177495 


0.177495 


50 


0.384440 


0.388775 


0.096784 


0.097828 


0.107526 


0.107458 


0.107458 


100 


0.271249 


0.274375 


0.048182 


0.048725 


0.054115 


0.054081 


0.054081 


200 


0.191595 


0.193826 


0.024039 


0.024316 


0.027144 


0.027127 


0.027127 



Table 5: Computational time [sec] and numerical error of the HGD algorithm are shown. The data is 
s = (2i/p(p+l))f^^ for each p. The error is evaluated by max^ \di logC(^) — s^j. Here di \ogC{6) is obtained 
by the power series expansion with accuracy 10~^ for p < 5. Just for information, the error evaluated by 
the HG method is displayed for p > 6. 

p error error time time 

(discrete) (continuous) (discrete) (continuous) 



2 


2.41e-07 


1.04e-08 


0.01 


0.16 


3 


6.53e-07 


1.81e-08 


0.02 


0.21 


4 


5.40e-07 


1.41e-08 


0.05 


0.37 


5 


1.45e-06 


1.78e-08 


0.14 


0.60 


6 


(1.69e-06) 


(1.09e-08) 


0.44 


0.99 


7 


(2.76e-06) 


(1.17e-08) 


1.15 


1.76 


8 


(6.14e-06) 


(1.29e-08) 


2.82 


3.81 


9 


(1.65e-05) 


(2.29e-08) 


6.60 


7.64 


10 


(1.69e-05) 


(2.06e-08) 


14.1 


15.7 



and the MLE computed by the continuous-time HGD algorithm is 

e = (-7.188333, -3.120184, -1.543555, -0.628081, 0). 
The initial parameter is 

4 3 2 1 



qW 



20' 20' 20' 20 



:,0 . 



5 Discussion 

In this paper, we show that it is possible to perform statistical inference on Bingham distributions based 
on the exact maximum likelihood method. This is due to the fact that the normalising constants can 
be calculated accurately using the standard theory of holonomic functions. The only requirement for the 
algorithm to generate the correct value is to start from some exact initial point of the curve along which the 
final solution located. In our examples, the Taylor expansion method can be easily utilized to generate an 
acceptable starting point. For example, as shown in Section 14.21 one starting point could be c{9/r) where 
r is such that the entries of the rescaled vector 6/r are so small so that the Taylor expansion estimation 
can be very accurate at 6/r. Alternatively, one could use the starting points given in Table [TJ While the 
method proposed is rather more computationally demanding than the saddle point approximation, it is in 
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(a) (^i,02)-plane. 
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(b) (?7i,?72)-plane. 

Figure 2: Trajectories oiQ^^'> for discrete algorithm (white circles) and Bir) for continuous algorithm (dashed 
line): (a) (0i, 02)-plane and (b) (771(0), 772(0) )-plane, where r\i{&) = di\ogC{6) denotes the expectation 
parameter. The black circle denotes the MLE. 
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fact very fast in the R package implementations and behaves well even for extreme values of the parameter 
9. We also show how the algorithms can be easily adopted in the degenerate cases of multiplicities in the 
parameter vector 6. 

Appendices 

A One-dimensional representation 

First we briefly describe derivation of the one-dimensional representation (|3]) according to Kume and Wood 
(2005). Note that the parameter Aj in their paper is our —6*,;. Consider p independent normal random 
variables Xi ~ -/V(0, {—20i)~^), where 9i < for all i. Then the marginal density of r = J^^^i ^1 i^ directly 
calculated as 

^^'~ r(p/2) c(o)- ^^' 

On the other hand, the characteristic function of r is 



W=^[e"r^-f,=ny-^. (27, 



In general, the density function is represented by its characteristic function as 



f{r) = — hm / 0(s)e-'^'-"'/2^s (28) 



at arbitrary continuous point of / (see e.g. iFellen (|l97ll )). By combining the equations ([26|) to ([28|) . we 
have 

cm - 2|^/(i) 

life V-t^fc 
= a hm r -^^e—-''-ds. 

Note that this expression holds for any p > 2. If p > 3, then 

27r 7_oc llfc y^Su - IS 

since lOfe^il^^fe ^ is)^^'^| is integrable over (—00,00). By analytic continuation with respect to s, we 
obtain 

^w =^ r 11 / ,\ ■ ^""'"^^ (29) 

for any real number to less than minfc(— 0^). Even if some ^^'s are not negative, the equation (1291) still holds 
due to analytic continuation with respect to 9, as long as ig < ™^k{~9k)- Hence we obtain (jH]). 

B Truncation error of the power series 

We der ive the power series expansion of C{9) and evaluate the truncation error according to iKovama et al.l 



(|2012a[ ). Let 0(0, d) = {(pi, . . . , ^i, . . . , 0g, . . . , (pq) be a parameter vector with multiplicities (di, . . . , dq). 
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Bv lKume and Wood (|2007l ). we have 



c{e{(p,d)) 



Let 



gp-i 

rm V V ^'"■■■^'^ n.r(fc. + rf./2) r(E,rf./2) 



= r(()) V '^i'---'^'^ n»r(fc. + '^»/2) r(E,rf./2) 

■ \,+.^, <^ fci!---fc.! r(E.(fc.+d./2))n.m/2) 



Then the truncation error is evaluated as 

\c{e{ci>,d))-CN{o{e,d))\ 



C(0) 

I0i|'^---l</'.^ ar(fc. + d,/2) r(E,d./2) 



~k,+.^k,>N k,\---k,\ r(E.(fc. + d./2))ar(d,/2) 

fci!---A; I 

fei+---+fc,>Ar « 



< 



E 






n! •■'-^ kil ■ ■ ■ kol 

n=N 
\\±\\N °° ll^lin--'^ 



'-'q\ 



E 



^ nrill Y^ IIV^IIl 



iV! ^^ (iV + 1)"-^ 

TV! iV+l-||0|li' 
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